ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks 20 ...
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks 20 ...
Stacked Cross Attention for Image-Text Matching 2020-03-06 15:13:08 Paper: https://arxiv.org/pdf/1 ...
Visual Semantic Reasoning for Image-Text Matching 2020-03-06 15:17:02 Paper: https://arxiv.org/ ...